Replace the API-only tool search with a fast local keyword/category search engine that eliminates API latency in the common case (~0.01ms vs 1-3s per query).

Architecture:
- 13 tool categories with keyword triggers and inverted indices built at module load time for O(1) token lookups
- Multi-signal scoring: category boost, name/description index match, exact name match, action-verb alignment, and 40+ intent-based disambiguation rules
- LRU result cache with 5-minute TTL (100 entries max)
- Lazy OpenAI client: local mode never touches the network

Search modes (TOOL_SEARCH_MODE env var):
- hybrid (default): local first, falls back to the API when confidence is low
- local: keyword-only, no API call
- api: original GPT-based semantic search

Also switches the API fallback model from gpt-4.1-mini to gpt-5-nano (8x cheaper input costs).

Accuracy on 180-case benchmark: 91.7% top-1, 100% top-3, 100% top-5.
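The architecture above can be illustrated with a minimal sketch. It is not the code in this PR: the tool catalog, the scoring weights, the `min_score` threshold, and the `api_search` callable are all hypothetical, and only two of the scoring signals (index match and exact name match) are shown. Only the `TOOL_SEARCH_MODE` variable and its three values come from the description.

```python
import os
import re
from collections import defaultdict

# Hypothetical tool catalog; names and descriptions are illustrative only.
TOOLS = {
    "create_circle": "draw a circle with a center and radius",
    "create_triangle": "draw a triangle from three points",
    "derivative": "compute the derivative of a function",
    "undo": "undo the last canvas action",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(tools):
    """Map each token to the set of tool names whose name or description contains it."""
    index = defaultdict(set)
    for name, description in tools.items():
        for token in tokenize(name + " " + description):
            index[token].add(name)
    return index


INDEX = build_index(TOOLS)  # built once, at module load time


def search_local(query, limit=3):
    """Score tools by token overlap, with an assumed bonus for an exact name match."""
    scores = defaultdict(float)
    for token in tokenize(query):
        for name in INDEX.get(token, ()):  # one dictionary lookup per token
            scores[name] += 1.0
    normalized = "_".join(tokenize(query))
    if normalized in TOOLS:
        scores[normalized] += 5.0  # assumed weight, not taken from the PR
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:limit]


def search(query, api_search=None, min_score=2.0):
    """Dispatch on TOOL_SEARCH_MODE; hybrid falls back to the API on low confidence."""
    mode = os.environ.get("TOOL_SEARCH_MODE", "hybrid")
    if mode == "api":
        if api_search is None:
            raise ValueError("api mode requires an api_search callable")
        return api_search(query)
    results = search_local(query)
    low_confidence = not results or results[0][1] < min_score
    if mode == "hybrid" and low_confidence and api_search is not None:
        return api_search(query)
    return results
```

Passing the API search in as a callable keeps the local path free of any network dependency, which mirrors the lazy-client idea: in `local` mode the callable is never invoked.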
- test_tool_search_local.py: offline benchmark (190-case dataset), latency tests (p99 < 5ms), and 62 creative prompt tests covering casual language, homework scenarios, physics/engineering, geometry constructions, statistics, graph theory, workspace ops, canvas ops, transforms, ambiguous terms, and edge cases
- test_tool_search_service.py: 23 new unit tests for local search, cache, mode switching, category registry, and lazy client init; update existing API tests for mode-aware fixtures
- test_tool_discovery_live.py: add search_ms and search_mode columns to CSV output for latency tracking
- scripts/compare_search_modes.py: side-by-side comparison of search modes with disagreement analysis and CSV export
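A p99 latency check like the one listed above could be structured as follows. This is a sketch under assumptions: the helper names, the nearest-rank percentile definition, and the repeat count are illustrative and are not taken from test_tool_search_local.py.

```python
import math
import time


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def measure_latencies_ms(fn, queries, repeats=50):
    """Call fn on every query `repeats` times and collect wall-clock latencies in milliseconds."""
    samples = []
    for _ in range(repeats):
        for query in queries:
            start = time.perf_counter()
            fn(query)
            samples.append((time.perf_counter() - start) * 1000.0)
    return samples
```

A test would then assert something like `percentile(measure_latencies_ms(search_fn, queries), 99) < 5.0`, where `search_fn` stands in for the local search function under test.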
- Reference Manual: add ToolSearchService section with architecture, class methods, and environment variable documentation
- README.md: add TOOL_SEARCH_MODE to configuration example
- CLAUDE.md: add TOOL_SEARCH_MODE to .env configuration section
- Project Architecture: update tool count and add tool discovery line
17 tests verify the full pipeline: natural-language prompt → real local tool search → real filtering → correct tool calls returned to the client. Only OpenAI API calls are mocked; _intercept_search_tools and ToolSearchService.search_tools_local run for real with TOOL_SEARCH_MODE=local.

- Streaming (14 tests): circle, triangle, derivative, solve, distribution, descriptive stats, graph, undo, save workspace, rotate, multi-tool, filtering of irrelevant tools, essential passthrough, no-search passthrough.
- Non-streaming (3 tests): o3 reasoning model, gpt-4.1 chat completion, chat completion with irrelevant tool filtering.
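The "mock only the API" pattern described above might look roughly like this. The `search_tools` function and its keyword rule are hypothetical stand-ins, not the real `_intercept_search_tools` or `ToolSearchService.search_tools_local`; the sketch only shows how a `unittest.mock.Mock` can prove that local mode never reaches the API.

```python
import os
from unittest import mock


def search_tools(query, api_search):
    """Hypothetical stand-in: answer locally in local mode, otherwise call the API."""
    if os.environ.get("TOOL_SEARCH_MODE", "hybrid") == "local":
        return ["create_circle"] if "circle" in query.lower() else []
    return api_search(query)


def run_local_mode_check():
    """Run a prompt in local mode and verify the mocked API was never called."""
    fake_api = mock.Mock(return_value=["from_api"])
    with mock.patch.dict(os.environ, {"TOOL_SEARCH_MODE": "local"}):
        result = search_tools("Draw a circle", api_search=fake_api)
    fake_api.assert_not_called()
    return result


def run_api_mode_check():
    """Run the same prompt in api mode and verify the mocked API was called once."""
    fake_api = mock.Mock(return_value=["from_api"])
    with mock.patch.dict(os.environ, {"TOOL_SEARCH_MODE": "api"}):
        result = search_tools("Draw a circle", api_search=fake_api)
    fake_api.assert_called_once_with("Draw a circle")
    return result
```

`mock.patch.dict` restores the original environment on exit, so the mode set for one test does not leak into the next.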
Summary
Search accuracy (180-case benchmark)
New tests
New tooling
- scripts/compare_search_modes.py for side-by-side mode comparison with disagreement analysis
Configuration
Set TOOL_SEARCH_MODE env var: hybrid (default), local, or api
Test plan
static/tool_search_service.py
🤖 Generated with Claude Code